
feat: wire NER entity extraction into Lithoglyph importer #32

Merged
hyperpolymath merged 1 commit into main from feature/entity-resolution-wiring on Mar 13, 2026

@hyperpolymath (Owner)

Summary

  • Adds a regex-based NER extraction module (NERExtractor) with three strategies: titled names, organisation suffixes, and capitalised multi-word sequences
  • Wires entity extraction into the Lithoglyph importer pipeline — after evidence creation, entities are extracted from content_text, resolved via Entities.resolve_ner_output/2 (exact match → fuzzy Jaro-Winkler → auto-create), and linked with :mentions relationship edges
  • Extends the Relationship schema to support :entity node type and :mentions edge type
  • Updates graph traversal helpers (get_node_relationships, find_path, parse_nodes) for entity nodes
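The three extraction strategies listed above can be illustrated with a small Python sketch. This is not the Elixir NERExtractor from this PR. The regex patterns, the stopword list, the rule that earlier strategies claim text first, and the entity type labels are all assumptions chosen for illustration.

```python
import re

# Hypothetical stopwords: capitalised words that commonly open a sentence.
STOPWORDS = {"The", "This", "That", "These", "A", "An", "In", "On", "At", "And", "But", "However"}

# Assumed patterns for the three strategies, tried in this priority order.
TITLED_NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Prof|Sir)\.?\s+(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")
ORG_SUFFIX = re.compile(r"\b(?:[A-Z][\w&]*\s+)+(?:Ltd|Inc|Corp|LLC|PLC|Group|Foundation)\b")
CAPITALISED_SEQ = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

# (pattern, hypothetical entity type label, minimum words after stopword removal)
STRATEGIES = [
    (TITLED_NAME, "person", 1),
    (ORG_SUFFIX, "organisation", 2),
    (CAPITALISED_SEQ, "unknown", 2),
]


def extract_entities(text: str) -> list[dict]:
    """Apply each strategy in turn. Text claimed by an earlier match is not
    reused, and names are deduplicated case-insensitively."""
    claimed: list[tuple[int, int]] = []
    seen: set[str] = set()
    entities: list[dict] = []
    for pattern, kind, min_words in STRATEGIES:
        for match in pattern.finditer(text):
            start, end = match.span()
            if any(start < c_end and c_start < end for c_start, c_end in claimed):
                continue  # overlaps a higher-priority match
            # Titled names keep only the name part, dropping "Dr." etc.
            raw = match.group("name") if "name" in pattern.groupindex else match.group()
            words = raw.split()
            while words and words[0] in STOPWORDS:
                words.pop(0)  # e.g. "The Royal Society" -> "Royal Society"
            if len(words) < min_words:
                continue
            claimed.append((start, end))
            name = " ".join(words)
            if name.casefold() not in seen:
                seen.add(name.casefold())
                entities.append({"name": name, "type": kind})
    return entities
```

With this ordering, "Dr. Jane Smith" is claimed by the titled-name strategy before the capitalised-sequence strategy can emit "Jane Smith" a second time. A later bare "Jane Smith" is then dropped by the deduplication step.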

Test plan

  • 13 NER extractor unit tests passing (titled names, orgs, capitalised sequences, stopword filtering, dedup, complex content, string/atom key handling)
  • Integration test against a running ArangoDB instance (entity resolution + edge creation)
  • Verify importer handles NER failures gracefully (best-effort, logged, non-blocking)
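The last test-plan item depends on one property: the post-create linking step must never fail the import. Below is a minimal Python sketch of that shape. `create_evidence` and `link_entities` are hypothetical stand-ins for the importer's persistence call and its NER → resolve → `:mentions`-edge step, and the logger name is also an assumption.

```python
import logging

logger = logging.getLogger("lithoglyph.importer")  # hypothetical logger name


def import_single_record(record, create_evidence, link_entities):
    """Create the evidence, then attempt entity linking as a best-effort step.

    `create_evidence` and `link_entities` are hypothetical callables. A failure
    in evidence creation still propagates; a linking failure does not.
    """
    evidence = create_evidence(record)
    try:
        linked = link_entities(evidence)
    except Exception:
        # Log with the traceback, then continue: linking never blocks the import.
        logger.exception("entity linking failed for evidence %s", evidence.get("id"))
        linked = 0
    return {"evidence": evidence, "entities_linked": linked}
```

In Elixir, the same guarantee would more likely come from matching on an `{:error, reason}` tuple or a `rescue` clause. The PR does not show which approach it uses.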

🤖 Generated with Claude Code

After evidence is imported from Lithoglyph, extract named entities from
content_text using regex-based NER (titled names, org suffixes, capitalised
sequences), resolve them against existing entities via Entities.resolve_ner_output
(exact → fuzzy → auto-create), and create :mentions relationship edges in
ArangoDB. Entity linking is best-effort — failures are logged but don't block
the import.
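The exact → fuzzy → auto-create cascade can be sketched as follows. This is a hypothetical Python illustration, not `Entities.resolve_ner_output/2`. The case-insensitive exact match, the 0.9 threshold, the in-memory entity list, and the generated ids are assumptions. The Jaro-Winkler function uses the standard prefix scale of 0.1 and caps the prefix at four characters.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: characters matching within a sliding window, penalised
    by half the number of matched characters that appear out of order."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    out_of_order, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scale * (1 - sim)


def resolve(name: str, entities: list[dict], threshold: float = 0.9) -> tuple[dict, str]:
    """Resolve a mention: exact match, then best fuzzy match, else create."""
    key = name.casefold()
    for entity in entities:
        if entity["name"].casefold() == key:
            return entity, "exact"
    best, best_score = None, 0.0
    for entity in entities:
        score = jaro_winkler(key, entity["name"].casefold())
        if score > best_score:
            best, best_score = entity, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy"
    created = {"id": f"entity-{len(entities) + 1}", "name": name}  # hypothetical id scheme
    entities.append(created)
    return created, "created"
```

Jaro-Winkler favours strings that share a prefix. Under these assumptions, "Acme Holdings Limited" therefore resolves to an existing "Acme Holdings Ltd" instead of creating a duplicate.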

- Add NERExtractor module with 3 extraction strategies
- Wire NER into Importer.import_single_record post-create step
- Extend Relationship schema with :entity type and :mentions edges
- Update graph traversal helpers to handle entity nodes
- Add 13 unit tests for NER extraction (all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
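The schema and traversal changes can be pictured with the sketch below. It assumes ArangoDB's standard `collection/key` document handles in an edge's `_from`/`_to` fields. Only the `:entity` node type and the `:mentions` edge type come from this PR. The collection names, the `Node` type, the helper names, and the evidence-to-entity direction of the edge are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from ArangoDB collection name to graph node type.
# Only the "entity" node type comes from the PR; other types are omitted.
COLLECTION_TO_KIND = {"evidence": "evidence", "entities": "entity"}


@dataclass(frozen=True)
class Node:
    kind: str
    key: str


def parse_node(handle: str) -> Node:
    """Split an ArangoDB document handle ("collection/key") into a typed node."""
    collection, sep, key = handle.partition("/")
    if not sep or not key:
        raise ValueError(f"not a document handle: {handle!r}")
    if collection not in COLLECTION_TO_KIND:
        raise ValueError(f"unknown node collection: {collection!r}")
    return Node(COLLECTION_TO_KIND[collection], key)


def mentions_edge(evidence_handle: str, entity_handle: str) -> dict:
    """Build a "mentions" edge document after checking both endpoint types."""
    source, target = parse_node(evidence_handle), parse_node(entity_handle)
    if source.kind != "evidence" or target.kind != "entity":
        raise ValueError("mentions edges are assumed to run from evidence to entity")
    return {"_from": evidence_handle, "_to": entity_handle, "type": "mentions"}
```

A traversal helper built on `parse_node` fails loudly on an unrecognised collection. Without that check, an unhandled node type such as an entity could be silently skipped.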
hyperpolymath merged commit 12b354f into main on Mar 13, 2026
14 of 16 checks passed
hyperpolymath deleted the feature/entity-resolution-wiring branch on March 13, 2026 at 16:53